Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NimBLEDevice::init() deadlock problem #584

Open
wants to merge 1 commit into
base: release/1.4
Choose a base branch
from

Conversation

CW-B-W
Copy link

@CW-B-W CW-B-W commented Aug 19, 2023

NimBLEDevice::host_task() has priority (configMAX_PRIORITIES - 4), which is defined at

xTaskCreatePinnedToCore(host_task, "nimble_host", NIMBLE_HS_STACK_SIZE,
NULL, (configMAX_PRIORITIES - 4), &host_task_h, NIMBLE_CORE);

When the priority of NimBLEDevice::init() is greater than (configMAX_PRIORITIES - 4), NimBLEDevice::init() will block NimBLEDevice::host_task() but still wait for NimBLEDevice::host_task() infinitely (waiting for m_synced to be set), resulting in the deadlock between NimBLEDevice::init() and NimBLEDevice::host_task().

This is caused by this while loop

while(!m_synced){
taskYIELD();
}

When the priority of NimBLEDevice::init() is greater than NimBLEDevice::host_task(), it cannot yield and let NimBLEDevice::host_task() be executed.

Using a binary semaphore can resolve this problem.

When the priority of NimBLEDevice::init() is greater than
(tskIDLE_PRIORITY+1), NimBLEDevice::init() will block NimBLE host
task and wait for NimBLE host task infinitely
@h2zero
Copy link
Owner

h2zero commented Aug 24, 2023

I've never heard of or experienced this issue before, do you have any examples where this can be reproduced?

@CW-B-W
Copy link
Author

CW-B-W commented Aug 24, 2023

Hello @h2zero,

Here is the minimum reproducible example (it tested it with bc333cc),
In this example, it will never print Initialized.
It got stuck inside NimBLEDevice::init(), and the NimBLEDevice was never sucessfully initialized.
And after a few seconds the system will be reset by watchdog.

#include <NimBLEDevice.h>

void InitNimbleTask(void* args)
{
    Serial.println("Initializing NimBLEDevice");
    NimBLEDevice::init("NimBLE-Arduino");
    Serial.println("Initialized");

    NimBLEDevice::startAdvertising();

    vTaskDelete(NULL);
}

void setup() {
    Serial.begin(115200);

    // host_task has priority (configMAX_PRIORITIES - 4)
    const uint32_t priority = (configMAX_PRIORITIES - 3); 

    // host_task is pinned at core 0
    xTaskCreatePinnedToCore(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL, 0); 
}


void loop() {

}

As stated in the PR, I think it's because host_task has priority (configMAX_PRIORITIES - 4)

xTaskCreatePinnedToCore(host_task, "nimble_host", NIMBLE_HS_STACK_SIZE,
NULL, (configMAX_PRIORITIES - 4), &host_task_h, NIMBLE_CORE);

When NimBLEDevice::init() has priority greater than (configMAX_PRIORITIES - 4), the host_task starves.

And if we set the priority of InitNimbleTask to (configMAX_PRIORITIES - 4) or lower, it works again.


Some even trickier situation,

  1. If we add a print in the problematic loop

    while(!m_synced){
    taskYIELD();
    }

    while(!m_synced){
        Serial.println("YIELD");
        taskYIELD();
    }

    It works again!
    This is because Serial.println() make the NimBLEDevice::init() task go to block state, which gives host_task some time to be executed.

  2. If we don't pin InitNimbleTask at core 0
    If we created the task with xTaskCreate instead of xTaskCreatePinnedToCore, it magically works.

    const uint32_t priority = (configMAX_PRIORITIES - 3);
    xTaskCreate(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL);

    But if we change the priority to (configMAX_PRIORITIES - 2) or (configMAX_PRIORITIES - 1), it gets stuck again.

    const uint32_t priority = (configMAX_PRIORITIES - 2);
    xTaskCreate(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL);

    I don't really know the reason behind, but I guess maybe it's because ESP32 has 2 cores, so the host_task may not be starved.

@h2zero
Copy link
Owner

h2zero commented Aug 25, 2023

Due to the nature of the BLE stack I would highly recommend using a task priority for the application that is less than the host task. Is there a specific purpose to have your application run at a higher priority than the host task?

@CW-B-W
Copy link
Author

CW-B-W commented Aug 26, 2023

Due to the nature of the BLE stack I would highly recommend using a task priority for the application that is less than the host task. Is there a specific purpose to have your application run at a higher priority than the host task?

Actually, I don't really need the priority of application be higher, I just accidentally found this problem when I set different task priority for my application.
I was porting this project to an MCU (other than Arduino, ESP32), and it uses polling to read user command from UART, one of the UART command is to turn BLE on.
To make the UART command handler be as responsive as possible, I made its priority the highest configMAX_PRIORITIES - 1, then I found when I launch NimBLEDevice::init() inside this command handler, which has the highest priority, it gets stuck.

@h2zero
Copy link
Owner

h2zero commented Nov 25, 2023

@CW-B-W That is a great explanation, thank you for this. I have tested your example and propose a different fix.

Instead of the semaphore, could you just replace the taskYIELD() in the while loop with ble_npl_time_delay(1) instead, that should resolve the issue as well without the extra memory requirements.

@i-am-shodan
Copy link

i-am-shodan commented Aug 6, 2024

Came here to say I've also independently found this issue

@i-am-shodan
Copy link

FYI I tried to use ble_npl_time_delay(1) I even went so far as making m_synced volatile. Doesn't work for me

@i-am-shodan
Copy link

Doesn't look like the patch works for me either. xSemaphoreTake never returns.

@i-am-shodan
Copy link

So after a lot of debugging it looks like one of my own tasks starving resources from NimBLEDevice. From the comments it looks like this lib doesn't need to create it's own task. I'd love to know how to implement this.

@h2zero
Copy link
Owner

h2zero commented Aug 10, 2024

@i-am-shodan you shouldn't need to do anything other that add a call to delay() in your own tasks/loop, this allows for a context switch.

@xgnata
Copy link

xgnata commented Sep 17, 2024

I'm also experiencing crashes due to starvation of the host task.
I have to paly with the priority of several tasks in my code to make it "work", but it is fragile.
I'm working on a reduced test case.

Is there a reason not to merge thechange proposed in #584 (comment) ?? This change looks good to me.

@h2zero
Copy link
Owner

h2zero commented Sep 21, 2024

Unless you have a very good reason to have any tasks with priority higher than the BLE host task then you should make sure that everything else is lower priority.

Is there a reason not to merge thechange proposed in #584 (comment) ?? This change looks good to me.

No reason, just haven't got to it yet 😄

@xgnata
Copy link

xgnata commented Sep 21, 2024 via email

@h2zero
Copy link
Owner

h2zero commented Sep 21, 2024

Yes, there is a lot of things going on in the BLE stack and it needs time to process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants